Thresholded Lasso for High Dimensional Variable Selection

Author

  • Shuheng Zhou
Abstract

Given n noisy samples with p dimensions, where n ≪ p, we show that the multi-step thresholding procedure based on the Lasso – we call it the Thresholded Lasso – can accurately estimate a sparse vector β ∈ R^p in a linear model Y = Xβ + ε, where X_{n×p} is a design matrix normalized to have column ℓ2-norm √n, and ε ∼ N(0, σ²I_n). We show that under the restricted eigenvalue (RE) condition (Bickel-Ritov-Tsybakov 09), it is possible to achieve the ℓ2 loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle while selecting a sufficiently sparse model – hence achieving sparse oracle inequalities; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. We also show, for the Gauss-Dantzig selector (Candès-Tao 07), that if X obeys a uniform uncertainty principle, one will achieve the sparse oracle inequalities as above, while allowing at most s0 irrelevant variables in the model in the worst case, where s0 ≤ s is the smallest integer such that for λ = √(2 log p / n), ∑_{i=1}^p min(β_i², λ²σ²) ≤ s0 λ²σ². Our simulation results on the Thresholded Lasso match our theoretical analysis excellently.

Keywords: linear regression, Thresholded Lasso, Lasso, ℓ1 regularization, ℓ0 penalty, multiple-step procedure, Gauss-Dantzig selector, ideal model selection, oracle inequalities, restricted orthonormality, restricted eigenvalue condition, statistical estimation, thresholding, linear sparsity, random matrices
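As a rough illustration of the multi-step procedure described above, the following Python sketch fits an initial Lasso at the penalty level λσ with λ = √(2 log p / n), thresholds the estimated coefficients, and refits ordinary least squares on the surviving support. The threshold constant (4λσ here), the simulated design, and the use of scikit-learn's Lasso are illustrative assumptions, not the paper's exact algorithm or tuning.

# Minimal sketch of a Lasso -> threshold -> OLS-refit pipeline
# (an illustration of the general idea, not the paper's exact procedure).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 1000, 10
X = rng.standard_normal((n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)   # column l2-norm sqrt(n), as in the abstract
beta = np.zeros(p)
beta[:s] = 1.0
sigma = 0.5
Y = X @ beta + sigma * rng.standard_normal(n)

lam = np.sqrt(2 * np.log(p) / n)              # lambda = sqrt(2 log p / n)

# Step 1: initial Lasso fit; sklearn minimizes (1/2n)||Y - Xb||^2 + alpha*||b||_1.
init = Lasso(alpha=lam * sigma, fit_intercept=False).fit(X, Y)

# Step 2: threshold small coefficients (the constant 4 is an assumed choice).
support = np.flatnonzero(np.abs(init.coef_) > 4 * lam * sigma)

# Step 3: refit unpenalized least squares on the selected support.
beta_hat = np.zeros(p)
if support.size:
    beta_hat[support], *_ = np.linalg.lstsq(X[:, support], Y, rcond=None)

print("selected:", support, " l2 error:", np.linalg.norm(beta_hat - beta))

The refit step is what lets the estimator escape the Lasso's shrinkage bias once the support has been trimmed, which is how a procedure of this kind can approach the oracle rate.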


Related Articles

Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space

High-dimensional data analysis has motivated a spectrum of regularization methods for variable selection and sparse modeling, with two popular classes of methods being convex and concave ones. A long debate has taken place on whether one class dominates the other, an important question both in theory and to practitioners. In this article, we characterize the asymptotic equivalence of regularization method...


Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models

In this paper, we introduce the Adaptive Cluster Lasso (ACL) method for variable selection in high dimensional sparse regression models with strongly correlated variables. To handle correlated variables, the concept of clustering or grouping variables and then pursuing model fitting is widely accepted. When the dimension is very high, finding an appropriate group structure is as difficult as the ori...
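A minimal sketch of the generic cluster-then-select idea this abstract alludes to (not the ACL algorithm itself): group columns by correlation, keep one representative per cluster, and run a Lasso on the representatives. The clustering cut-off, the choice of representatives, and the penalty level are all assumed for illustration.

# Cluster correlated columns, then select among cluster representatives
# (a generic sketch of the idea, not the ACL method).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 60
Z = rng.standard_normal((n, p // 3))
X = np.repeat(Z, 3, axis=1) + 0.1 * rng.standard_normal((n, p))  # blocks of 3 near-duplicate columns
Y = X[:, 0] - X[:, 30] + 0.5 * rng.standard_normal(n)

# Hierarchical clustering on the correlation distance 1 - |corr|.
D = 1 - np.abs(np.corrcoef(X, rowvar=False))
labels = fcluster(linkage(D[np.triu_indices(p, 1)], method="average"),
                  t=0.5, criterion="distance")

# One representative column per cluster, then a Lasso on the representatives.
reps = np.array([np.flatnonzero(labels == c)[0] for c in np.unique(labels)])
fit = Lasso(alpha=0.1, fit_intercept=False).fit(X[:, reps], Y)
print("selected representatives:", reps[fit.coef_ != 0])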


The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso)

We revisit the adaptive Lasso as well as the thresholded Lasso with refitting, in a high-dimensional linear model, and study prediction error, ℓq-error (q ∈ {1, 2}), and the number of false positive selections. Our theoretical results for the two methods are, at a rather fine scale, comparable. The differences only show up in terms of the (minimal) restricted and sparse eigenvalues, favor...
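For contrast with the thresholding-plus-refitting sketch above, here is a minimal sketch of the adaptive Lasso that this abstract compares against: the ℓ1 penalty is reweighted by an initial estimate so that large coefficients are penalized less. The ridge initializer, the weight exponent of one, and the penalty level are assumptions made for illustration.

# Adaptive Lasso via the standard reweighting trick: rescale columns by
# 1/w, run a plain Lasso, then unscale the coefficients.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0
Y = X @ beta + rng.standard_normal(n)

init = Ridge(alpha=1.0, fit_intercept=False).fit(X, Y).coef_
w = 1.0 / (np.abs(init) + 1e-8)      # weights ~ 1/|initial estimate|

# Solving (1/2n)||Y - (X/w)c||^2 + alpha*||c||_1 and setting b = c/w is
# equivalent to using the weighted penalty alpha * sum_j w_j |b_j|.
fit = Lasso(alpha=0.1, fit_intercept=False).fit(X / w, Y)
beta_hat = fit.coef_ / w
print("false positives:", np.sum((beta_hat != 0) & (beta == 0)))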


Thresholded Lasso for high dimensional variable selection and statistical estimation

Given n noisy samples with p dimensions, where n ≪ p, we show that the multi-step thresholding procedure based on the Lasso – we call it the Thresholded Lasso – can accurately estimate a sparse vector β ∈ R^p in a linear model Y = Xβ + ε, where X_{n×p} is a design matrix normalized to have column ℓ2-norm √n, and ε ∼ N(0, σ²I_n). We show that under the restricted eigenvalue (RE) condition (Bickel-Rito...


Sure independence screening for ultrahigh dimensional feature space

High dimensionality is a growing feature in many areas of contemporary statistics. Variable selection is fundamental to high-dimensional statistical modeling. For problems of large or huge scale p_n, computational cost and estimation accuracy are always two top concerns. In a seminal paper, Candès and Tao (2007) propose a minimum ℓ1 estimator, the Dantzig selector, and show that it mimics the id...
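A minimal sketch of the screening step this abstract describes: rank features by marginal correlation with the response, keep the top d, and hand the reduced problem to a second-stage penalized fit. A Lasso stands in for that second stage here, and the cut-off d = n - 1 and the penalty level are assumptions for illustration.

# Sure independence screening: keep the d columns most correlated with Y,
# then fit a penalized regression on the reduced design.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 100, 5000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0
Y = X @ beta + rng.standard_normal(n)

d = n - 1
score = np.abs(X.T @ Y)              # proportional to |marginal corr| for (near-)standardized columns
keep = np.argsort(score)[-d:]

fit = Lasso(alpha=0.2, fit_intercept=False).fit(X[:, keep], Y)
print("selected (original indices):", keep[fit.coef_ != 0])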




Publication year: 2010